arXiv:1502.07363v1 [cond-mat.stat-mech] 19 Jan 2015


Entropy of finite random binary sequences with weak long-range correlations 

S. S. Melnik* and O. V. Usatenko 
A. Ya. Usikov Institute for Radiophysics and Electronics 
Ukrainian Academy of Science, 12 Proskura Street, 61805 Kharkov, Ukraine 

We study the N-step binary stationary ergodic Markov chain and analyze its differential entropy.
Supposing that the correlations are weak, we express the conditional probability function of the
chain through the pair correlation function and represent the entropy as a functional of the pair
correlator. Since the model uses two-point correlators instead of block probabilities, it makes it
possible to calculate the entropy of strings at much longer distances than standard methods allow.

A fluctuation contribution to the entropy due to the finiteness of random chains is examined. This
contribution can be of the same order as the regular part even at relatively short lengths of
subsequences. A self-similar structure of the entropy with respect to decimation transformations is
revealed for some specific forms of the pair correlation function. An application of the theory to
the DNA sequence of the R3 chromosome of Drosophila melanogaster is presented.


PACS numbers: 05.40.-a, 02.50.Ga, 87.10+e 

I. INTRODUCTION 

At present there is a commonly accepted viewpoint that our world is complex and correlated. The most peculiar manifestations of this concept are the records of brain activity and heart beats, human and animal communication, written texts of natural languages, DNA and protein sequences, data flows in computer networks, stock indexes, solar activity, weather (the chaotic nature of the atmosphere), etc. For this reason, systems with long-range interactions (and/or with long-range memory) and natural sequences with non-trivial information content have been the focus of a large number of studies in different fields of science over the past several decades.

Random sequences with a finite number of states exist as natural sequences (DNA or natural texts) or arise as a result of coarse-grained mapping of the evolution of a chaotic dynamical system into a string of symbols. Such sequences are very closely connected to, and are the subject of study of, algorithmic (Kolmogorov-Solomonoff-Chaitin) complexity, artificial intelligence, information theory, compressibility of digital data, the statistical inference problem, and computability. They also have many applications as a creative tool for designing devices and appliances with random components in their structure (various wave filters, diffraction gratings, artificial materials, antennas, converters, delay lines, etc.).

There are many methods for describing complex dynamical systems and the random sequences connected with them: correlation functions, fractal dimensions, multipoint probability distribution functions, and many others. One of the most convenient characteristics for studying complex dynamics is entropy. Being a measure of the information content and redundancy in a sequence of data, it is a powerful and popular tool in the examination of complexity phenomena. It has been used for the analysis of a number of different dynamical systems.

A standard method of understanding and describing the statistical properties of real physical systems or random sequences of data can be represented as follows. First of all, we have to analyze the sequence to find the correlation functions or the probabilities of occurrence of words whose length L exceeds the correlation length $R_c$ but is much shorter than the length M of the sequence,

$$R_c < L \ll M. \qquad (1)$$

At the same time, the number of different words of length L composed in an alphabet containing d letters has to be much less than the number M - L of words in the sequence,

$$d^L \ll M - L \simeq M. \qquad (2)$$

The next step is to express the correlation properties of the sequence in terms of the conditional probability function (CPF) of the Markov chain, see Eq. (5) below. Note that the Markov chain should be of order N, which is supposed to be larger than the correlation length,

$$R_c < N. \qquad (3)$$

This is the critical requirement, because the correlation length of natural sequences of interest (e.g., written or DNA texts) is usually of the same order as the length of the sequences, so that none of inequalities (1)-(3) can be fulfilled. Indeed, the lengths of words that can correctly represent the probabilities of word occurrence are 4-5 letters for a real natural text of length $10^{\ldots}$ (written in an alphabet containing 27-30 letters and symbols), or of order 20 symbols for a coarse-grained text represented by means of a binary sequence.
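The word-length limit implied by condition (2) can be made concrete with a few lines of Python. This is a minimal sketch: the hypothetical helper `max_word_length` returns the largest L for which $d^L$ does not exceed M, which is only an upper bound, since (2) actually demands $d^L \ll M$. The sequence length $M = 10^6$ in the example is an illustrative assumption, not a value taken from the text.

```python
def max_word_length(d: int, M: int) -> int:
    """Largest L with d**L <= M: an upper bound for condition (2), d**L << M."""
    L = 0
    # Integer arithmetic avoids rounding errors in log(M) / log(d).
    while d ** (L + 1) <= M:
        L += 1
    return L


# Illustrative sequence length (an assumption, not taken from the text).
M = 10 ** 6
for d in (2, 27, 30):
    print(f"d = {d:2d}: L <= {max_word_length(d, M)}")
```

With this assumed M the bound is 19 symbols for a binary alphabet and 4 letters for 27-30 letters, the same orders of magnitude as quoted above.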

Here we develop an approach that is complementary to the one outlined above. In particular, we represent the conditional probability function of the Markov chain by means of the pair correlator, which makes it possible to calculate the entropy of the sequence analytically. It should be stressed that the standard method for calculating the entropy can take into account only the short-range part of the statistics. We present a theory that expresses long-range correlation properties through the correlation functions.

The paper is organized as follows. First, we briefly discuss the properties of the N-step additive Markov chain model and, supposing that the correlations between symbols in the sequence are weak, express the conditional probability function by means of the pair correlation function. In the next section we represent the differential entropy in terms of the conditional probability function of the Markov chain and express the entropy as the sum of squares of the pair correlators. Then we discuss some properties of the results obtained, in particular, the self-similarity of the entropy with respect to decimation for some particular classes of Markov chains. Next, a fluctuation contribution to the entropy due to the finiteness of random chains is examined. Some remarks on literary texts are followed by a discussion of directions in which the research can be developed.


II. ADDITIVE MARKOV CHAINS 

This section contains mainly introductory material. Some of the authors' results presented by Eqs. [...] were reported earlier in Ref. [...].

Consider a semi-infinite sequence $\mathbb{A} = \{a_i\}_{i \in \mathbb{N}_+} = a_0, a_1, a_2, \ldots$ of random variables $a_i$ taken from the finite alphabet $\mathcal{A} = \{1, 2, \ldots, d\}$, $a_i \in \mathcal{A}$. The sequence $\mathbb{A}$ is an N-step Markov chain if it possesses the following property: the probability of symbol $a_i$ to have a certain value $a$, under the condition that the values of all previous symbols are given, depends only on the values of the N previous symbols,

$$P(a_i = a \mid \ldots, a_{i-2}, a_{i-1}) = P(a_i = a \mid a_{i-N}, \ldots, a_{i-2}, a_{i-1}). \qquad (4)$$

Note that definition (4) is valid for $i \ge N$; for $i < N$ we have to use the well-known compatibility conditions for the conditional probability functions of lower order (see Ref. [...]). The number N is also referred to as the order or the memory length of the Markov chain. The conditional probability function $P(a_i = a \mid a_{i-N}, \ldots, a_{i-2}, a_{i-1})$ completely determines all statistical properties of the Markov chain and the method of its iterative numerical construction. If the sequence whose statistical properties we would like to analyze is given, the conditional probability function of the N-th order can be found by the standard method,

$$P(a_{N+1} = a \mid a_1, \ldots, a_N) = \frac{P(a_1, \ldots, a_N, a)}{P(a_1, \ldots, a_N)}, \qquad (5)$$

where $P(a_1, \ldots, a_N)$ is the probability of occurrence of the N-word $a_1, \ldots, a_N$.
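Eq. (5) translates directly into a counting estimator. The sketch below (a hypothetical helper `empirical_cpf`, not code from the paper) estimates the binary CPF from the frequencies of N-words and (N+1)-words in a single sequence, which is legitimate for an ergodic chain; as condition (2) warns, the estimate becomes unreliable once $2^N$ approaches the sequence length.

```python
from collections import Counter


def empirical_cpf(seq, N):
    """Estimate P(a_{N+1} = 1 | a_1..a_N) from word counts, following Eq. (5).

    seq: sequence of 0/1 symbols; N: assumed order of the chain.
    Returns a dict mapping each observed N-word (a tuple) to the estimated
    probability that it is followed by a 1.
    """
    # Count only N-words followed by at least one more symbol, so that the
    # numerator and the denominator refer to the same set of positions.
    n_words = Counter(tuple(seq[i:i + N]) for i in range(len(seq) - N))
    n_ext = Counter(tuple(seq[i:i + N + 1]) for i in range(len(seq) - N))
    return {w: n_ext[w + (1,)] / n for w, n in n_words.items()}


print(empirical_cpf([0, 1] * 10, 1))   # {(0,): 1.0, (1,): 0.0}
```

For the strictly alternating sequence the estimated CPF is deterministic, as expected.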

The Markov chain determined by Eq. (4) is a homogeneous sequence because its conditional probability does not depend explicitly on i, i.e., it is independent of the position of the symbols $a_{i-N}, \ldots, a_{i-1}, a_i$ in the chain. It depends only on the values of the symbols $a_{i-N}, \ldots, a_{i-1}, a_i$. Homogeneous sequences are stationary: the average value of any function $f(a_{r_1}, a_{r_1+r_2}, \ldots, a_{r_1+\ldots+r_s})$ of s arguments,

$$\overline{f(a_{r_1}, \ldots, a_{r_1+\ldots+r_s})} = \lim_{M\to\infty}\frac{1}{M}\sum_{i=0}^{M-1} f(a_{i+r_1}, \ldots, a_{i+r_1+\ldots+r_s}), \qquad (6)$$

depends only on the s - 1 differences between the indexes. In other words, all statistically averaged functions of the random variables are shift-invariant.

We suppose that the chain is ergodic. According to the Markov theorem (see, e.g., Ref. [...]), this property holds for homogeneous Markov chains if the strict inequalities

$$0 < P(a_{i+N} = a \mid a_i, \ldots, a_{i+N-1}) < 1, \qquad i \in \mathbb{N}_+ = \{0, 1, \ldots\}, \qquad (7)$$

are fulfilled for all possible values of the arguments of the conditional probability function. Hereafter we use the shorter notation $a_{i-N}^{i-1}$ for the N-word $a_{i-N}, \ldots, a_{i-1}$. It follows from ergodicity that the correlations between any blocks of symbols in the chain go to zero when the distance between them goes to infinity. Another consequence of ergodicity is the possibility of using one random sequence as an equitable representative of the ensemble of chains and of averaging over the sequence, Eq. (6), instead of over the ensemble.

Below we consider an important class of binary random sequences with symbols $a_i$ taking two values, say 0 and 1, $a_i \in \{0, 1\}$. The conditional probability of finding the i-th element $a_i = 1$ in the binary N-step Markov sequence, depending on the N preceding elements, is a set of $2^N$ numbers:

$$P(1 \mid a_{i-N}^{i-1}) = P(a_i = 1 \mid a_{i-N}^{i-1}), \qquad P(0 \mid a_{i-N}^{i-1}) = 1 - P(1 \mid a_{i-N}^{i-1}). \qquad (8)$$

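Conditional probability (8) of the binary sequence of random variables $a_i \in \{0, 1\}$ can be represented exactly as a finite polynomial series:

$$\begin{aligned} P(1 \mid a_{i-N}^{i-1}) = \bar a &+ \sum_{r=1}^{N} F_1(r)(a_{i-r} - \bar a) + \sum_{r_1, r_2 = 1}^{N} F_2(r_1, r_2)(a_{i-r_1} - \bar a)(a_{i-r_2} - \bar a) \\ &+ \ldots + \sum_{r_1, \ldots, r_N = 1}^{N} F_N(r_1, \ldots, r_N)(a_{i-r_1} - \bar a)\cdots(a_{i-r_N} - \bar a), \end{aligned} \qquad (9)$$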
where the statistical averages are taken over the sequence, $F_s$ is the family of memory functions, and $\bar a$ is the relative average number of unities in the sequence. The representation of Eq. (8) in the form of Eq. (9) follows from the simple identities $a^2 = a$ and $f(a) = a f(1) + (1 - a) f(0)$ for an arbitrary function $f(a)$ defined on the set $a \in \{0, 1\}$. The first term in Eq. (9) is responsible for the generation of uncorrelated white-noise sequences. Taking into account the second term, proportional to $F_1(r)$, we can correctly reproduce the correlation properties of the chain up to second order. Higher-order correlators and all correlation properties of higher orders are then no longer independent. We cannot control them and reproduce them correctly by means of the memory function $F_1(r)$, because the latter is completely determined by the pair correlation function, see Eq. (11) below. The study of the properties of these higher-order correlators is beyond the scope of this paper. In what follows we use only the first two terms, which determine the so-called additive Markov chain [3, ...].

A particular form of the conditional probability function of the additive Markov chain is the chain with a step-wise memory function,

$$P(1 \mid k) = \frac{1}{2} + \mu\left(\frac{2k}{N} - 1\right). \qquad (10)$$

The probability $P(1 \mid k)$ of having the symbol $a_i = 1$ after the N-word containing k unities, $k = \sum_{r=1}^{N} a_{i-r}$, is a linear function of k and is independent of the arrangement of symbols in the word $a_{i-N}^{i-1}$. The parameter $\mu$ characterizes the strength of correlations in the system.
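Setting $\bar a = 1/2$ and a constant memory function $F(r) = 2\mu/N$ in the first two terms of Eq. (9) reproduces Eq. (10), since $1/2 + (2\mu/N)(k - N/2) = 1/2 + \mu(2k/N - 1)$. The sketch below, with hypothetical helper names, checks this numerically; note that the strict inequalities (7) require $|\mu| < 1/2$.

```python
def stepwise_cpf(word, mu):
    """P(1 | k) of Eq. (10) for a 0/1 word of length N containing k unities."""
    N, k = len(word), sum(word)
    return 0.5 + mu * (2 * k / N - 1)


def additive_cpf(word, F, a_bar=0.5):
    """First two terms of Eq. (9): a_bar + sum_r F(r) (a_{i-r} - a_bar).

    word is ordered a_{i-N}, ..., a_{i-1}, so word[-r] is a_{i-r};
    F[r - 1] holds F(r).
    """
    return a_bar + sum(F[r - 1] * (word[-r] - a_bar) for r in range(1, len(word) + 1))


word, mu = [1, 0, 1, 1], 0.2
N = len(word)
# Both values equal 0.6 up to floating-point rounding.
print(stepwise_cpf(word, mu), additive_cpf(word, [2 * mu / N] * N))
```

Because only the count k enters, any permutation of the word gives the same probability, which is the permutability property used later for decimation.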

There is a rather simple relation between the memory function $F(r)$ (hereafter we omit the subscript 1 of $F_1(r)$) and the pair correlation function of the binary additive Markov chain. Two methods have been suggested for finding $F(r)$ for a sequence with a known pair correlation function. The first one [3] is based on the minimization of a "distance" between the Markov chain generated by means of the sought-for memory function and the initial given sequence of symbols with a known correlation function. The minimization equation yields the relationship between the correlation and memory functions,

$$K(r) = \sum_{r'=1}^{N} F(r')K(r - r'), \qquad r \ge 1, \qquad (11)$$

where the normalized correlation function $K(r)$ is given by

$$K(r) = \frac{C(r)}{C(0)}, \qquad C(r) = \overline{(a_i - \bar a)(a_{i+r} - \bar a)}. \qquad (12)$$
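For $r = 1, \ldots, N$, Eq. (11) is a linear system with the symmetric Toeplitz matrix $K(|r - r'|)$, the same structure as the Yule-Walker equations of autoregressive models. A minimal NumPy sketch, assuming $K(-r) = K(r)$ and imposing Eq. (11) only for $r = 1, \ldots, N$ (so that there are N equations for N unknowns); the function name is hypothetical:

```python
import numpy as np


def memory_from_correlation(K):
    """Solve Eq. (11) for the memory function F(1), ..., F(N).

    K: array-like with K[0] = 1 and K[r] = K(r) for r = 1..N.
    Uses the stationarity symmetry K(-r) = K(r).
    """
    K = np.asarray(K, dtype=float)
    N = len(K) - 1
    r = np.arange(1, N + 1)
    # A[r-1, r'-1] = K(|r - r'|); Eq. (11) reads A @ F = (K(1), ..., K(N)).
    A = K[np.abs(r[:, None] - r[None, :])]
    return np.linalg.solve(A, K[1:])


print(memory_from_correlation([1.0, 0.5, 0.25]))   # approximately [0.5, 0.0]
```

Geometrically decaying correlations, $K(r) = 0.5^r$, are thus generated by a one-step memory, $F = (0.5, 0)$.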

The second method for deriving Eq. (11) is a completely probabilistic, straightforward calculation [...].

Equation (11), despite its simplicity, can be solved analytically only in some particular cases: for one- or two-step chains, for the Markov chain with a step-wise memory function, and so on. To avoid the difficulties of solving Eq. (11), we suppose that the correlations in the sequence are weak. This means that all components of the normalized correlation function are small, $|K(r)| \ll 1$, $r \ne 0$, with the exception of $K(0) = 1$. So, taking into account that in the sum of Eq. (11) the leading term is $K(0) = 1$ and all the others are small, we can obtain an approximate solution for the memory function in the form of the series

$$F(r) = K(r) - \sum_{\substack{r'=1 \\ r' \ne r}}^{N} K(r - r')K(r') + \sum_{\substack{r'=1 \\ r' \ne r}}^{N}\,\sum_{\substack{r''=1 \\ r'' \ne r'}}^{N} K(r - r')K(r' - r'')K(r'') + \ldots \qquad (13)$$
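Since $K(0) = 1$, the matrix of Eq. (11) can be written as $I + B$, where B holds the off-diagonal values $K(r - r')$; series (13) is then the Neumann series $F = \sum_n (-B)^n K$, which converges when the correlations are weak enough. The sketch below truncates it after a given number of terms and compares it with a direct solution of Eq. (11); the correlator $0.05/r$ is a made-up example and `memory_series` is a hypothetical name.

```python
import numpy as np


def memory_series(K, order):
    """Truncated series (13): F = sum_{n < order} (-B)^n K, B[r, r'] = K(r - r'), r' != r."""
    K = np.asarray(K, dtype=float)
    N = len(K) - 1
    r = np.arange(1, N + 1)
    B = K[np.abs(r[:, None] - r[None, :])]
    np.fill_diagonal(B, 0.0)          # exclude the term r' = r
    term = K[1:].copy()
    F = np.zeros(N)
    for _ in range(order):
        F += term
        term = -B @ term              # next term of the series
    return F


# Weak, hypothetical correlator K(r) = 0.05 / r for r = 1..10.
N = 10
K = np.concatenate(([1.0], 0.05 / np.arange(1, N + 1)))
r = np.arange(1, N + 1)
A = K[np.abs(r[:, None] - r[None, :])]
F_exact = np.linalg.solve(A, K[1:])   # direct solution of Eq. (11)
for order in (1, 2, 3):
    print(order, np.max(np.abs(memory_series(K, order) - F_exact)))
```

Each additional term reduces the maximal error, because every row sum of |B| is well below one for such weak correlations.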

The equation for the conditional probability function in the first approximation with respect to the small functions $|K(r)| \ll 1$, $r \ne 0$, takes the form

$$P(1 \mid a_{i-N}^{i-1}) \simeq \bar a + \sum_{r=1}^{N} F(r)(a_{i-r} - \bar a) \simeq \bar a + \sum_{r=1}^{N} K(r)(a_{i-r} - \bar a). \qquad (14)$$

This formula provides a very important tool for constructing a sequence with a given pair correlation function. Note that the i-independence of the function $P(1 \mid a_{i-N}^{i-1})$ guarantees the homogeneity and stationarity of the sequence under consideration, and the finiteness of N provides its ergodicity. Evidently, we can only consider sequences with correlation functions for which the resulting $P(1 \mid a_{i-N}^{i-1})$ satisfies Eq. (7).
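Eq. (14) is a direct recipe for generating a weakly correlated binary chain. A minimal sketch, assuming $\bar a = 1/2$ by default and using a sufficient (not necessary) check of the inequalities (7); the function names are hypothetical. The companion estimator evaluates Eq. (12) by averaging along the generated chain, which ergodicity permits.

```python
import numpy as np


def generate_additive_chain(K, M, a_bar=0.5, seed=0):
    """Generate M binary symbols with the CPF of Eq. (14).

    K: sequence with K[r-1] = K(r), r = 1..N.
    The first N symbols are drawn uncorrelated to seed the recursion and are not returned.
    """
    K = np.asarray(K, dtype=float)
    N = len(K)
    # Sufficient condition for (7): |delta| < min(a_bar, 1 - a_bar) for every N-word.
    if np.sum(np.abs(K)) * max(a_bar, 1 - a_bar) >= min(a_bar, 1 - a_bar):
        raise ValueError("correlations too strong to guarantee 0 < P(1|...) < 1")
    rng = np.random.default_rng(seed)
    a = (rng.random(M + N) < a_bar).astype(float)
    for i in range(N, M + N):
        past = a[i - N:i][::-1]                  # a_{i-1}, a_{i-2}, ..., a_{i-N}
        p = a_bar + np.dot(K, past - a_bar)      # Eq. (14)
        a[i] = float(rng.random() < p)
    return a[N:].astype(int)


def correlation_function(a, r_max):
    """Normalized correlator K(r) = C(r) / C(0) of Eq. (12), averaged along the chain."""
    d = np.asarray(a, dtype=float)
    d = d - d.mean()
    M = len(d)
    C = np.array([np.dot(d[:M - r], d[r:]) / (M - r) for r in range(r_max + 1)])
    return C / C[0]


chain = generate_additive_chain([0.2], 20000)
print(correlation_function(chain, 2))
```

For this one-step chain the exact correlator is $K(r) = 0.2^r$, so the printed values should be close to 1, 0.2 and 0.04, up to statistical errors of order $1/\sqrt{M}$.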

Correlation functions are typically employed as the input characteristics for describing random sequences. However, the correlation function describes not only the direct interconnection of the elements $a_i$ and $a_{i+r}$, but also takes into account their indirect interaction via all other intermediate elements. Our approach operates with the "origin" characteristic of the system, namely, the memory function. The correlation and memory functions are mutually complementary characteristics of a random sequence in the following sense. The numerical analysis of a given random sequence makes it possible to determine directly the correlation function rather than the memory function. On the other hand, in the general case it is possible to construct a random sequence using the memory function, but not the correlation function. Therefore, the memory function permits one to gain deeper insight into the intrinsic properties of correlated systems. Equation (14) shows that in the limit of weak correlations both functions play the same role.

The concept of the additive Markov chain has been used extensively for studying random sequences with long-range correlations. Examples and references can be found in Ref. [...].
III. DIFFERENTIAL ENTROPY

In order to estimate the entropy of an infinite stationary sequence $\mathbb{A}$ of symbols $a_i$, one can use the block entropy,

$$H_L = -\sum_{a_1, \ldots, a_L} P(a_1^L)\log_2 P(a_1^L). \qquad (15)$$

Here $P(a_1^L) = P(a_1, \ldots, a_L)$ is the probability of finding the L-word $a_1^L$ in the sequence. The differential entropy, or entropy per symbol, is given by

$$h_L = H_{L+1} - H_L, \qquad (16)$$

and specifies the degree of uncertainty of the (L+1)-th symbol when the preceding L symbols are known. The source entropy (or Shannon entropy) is the differential entropy in the asymptotic limit, $h = \lim_{L\to\infty} h_L$. This quantity measures the average information per symbol when all correlations, in the statistical sense, are taken into account.

The differential entropy $h_L$ can be expressed in terms of the conditional probability function. To show this, we rewrite Eq. (15) for a block of length L + 1, express $P(a_1^{L+1})$ via the conditional probability, and after a bit of algebra obtain

$$h_L = \sum_{a_1, \ldots, a_L} P(a_1^L)\, h(a_{L+1} \mid a_1^L) = \overline{h(a_{L+1} \mid a_1^L)}. \qquad (17)$$

Here $h(a_{L+1} \mid a_1^L)$ is the amount of information contained in the (L+1)-th symbol of the sequence conditioned on the L previous symbols,

$$h(a_{L+1} \mid a_1^L) = -\sum_{a_{L+1}=0,1} P(a_{L+1} \mid a_1^L)\log_2 P(a_{L+1} \mid a_1^L). \qquad (18)$$

Thus, the differential entropy of a random sequence is a special case of the standard conditional entropy $H = -\sum_C P(C)\sum_B P(B \mid C)\log_2 P(B \mid C)$.

The conditional probability $P(1 \mid a_{i-L}^{i-1})$ at L < N,

$$P(1 \mid a_{i-L}^{i-1}) \simeq \bar a + \delta, \qquad \delta = \sum_{r=1}^{L} F(r)(a_{i-r} - \bar a), \qquad (19)$$

is obtained in the first approximation in the parameter $\delta$ from Eq. (14) by means of the probabilistic reasoning presented in the Appendix.

Taking into account the weakness of correlations, $|\delta| \ll \min[\bar a, 1 - \bar a]$, one can expand the right-hand side of Eq. (18), regarded as a function of the probability $p = \bar a + \delta$ of the symbol 1, in a Taylor series up to second order in $\delta$,

$$h(a_{L+1} \mid a_1^L) = h_0 + \left.\frac{dh}{dp}\right|_{\delta=0}\delta + \frac{1}{2}\left.\frac{d^2h}{dp^2}\right|_{\delta=0}\delta^2,$$

where the derivatives are taken at the "equilibrium point" $P(1 \mid a_{i-N}^{i-1}) = \bar a$, and $h_0$ is the entropy of the uncorrelated sequence,

$$h_0 = -\bar a\log_2\bar a - (1 - \bar a)\log_2(1 - \bar a). \qquad (20)$$

Upon using Eq. (19) for averaging $h(a_{L+1} \mid a_1^L)$, and in view of $\bar\delta = 0$, $d^2h/dp^2|_{\delta=0} = -1/[\bar a(1 - \bar a)\ln 2]$, and $\overline{\delta^2} \simeq \bar a(1 - \bar a)\sum_{r=1}^{L}F^2(r)$ to leading order in the weak correlations, the differential entropy of the sequence becomes

$$h_L = \begin{cases} h_0 - \dfrac{1}{2\ln 2}\displaystyle\sum_{r=1}^{L} F^2(r), & L \le N, \\[6pt] h_{L=N}, & L > N. \end{cases} \qquad (21)$$

If the length of the block exceeds the memory length, L > N, the conditional probability $P(1 \mid a_{i-L}^{i-1})$ depends only on the N previous symbols, see Eq. (4). Then it is easy to show from Eq. (17) that the differential entropy remains constant at L > N. The second line of Eq. (21) is consistent with the first one because, in the first approximation in $\delta$, the correlation function vanishes at L > N together with the memory function. The final expression, the main result of the paper, for the differential entropy of the stationary ergodic binary weakly correlated random sequence is

$$h_L = h_0 - \frac{1}{2\ln 2}\sum_{r=1}^{L} K^2(r). \qquad (22)$$
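The two routes to $h_L$ compared in Fig. 1 can be sketched side by side: the block-counting estimate of Eqs. (15)-(16) and the correlator formula (22), with $h_0$ from Eq. (20). This is a plain plug-in sketch without bias corrections, so the counting estimate inherits limitation (2); the function names are illustrative.

```python
import math
from collections import Counter


def block_entropy(seq, L):
    """H_L of Eq. (15), estimated from the frequencies of overlapping L-words."""
    if L == 0:
        return 0.0
    counts = Counter(tuple(seq[i:i + L]) for i in range(len(seq) - L + 1))
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def differential_entropy_blocks(seq, L):
    """h_L = H_{L+1} - H_L, Eq. (16); reliable only while 2**L << len(seq), cf. Eq. (2)."""
    return block_entropy(seq, L + 1) - block_entropy(seq, L)


def differential_entropy_correlator(K, a_bar, L):
    """h_L of Eq. (22); K[r-1] = K(r), and K(r) = 0 is assumed beyond len(K)."""
    h0 = -a_bar * math.log2(a_bar) - (1 - a_bar) * math.log2(1 - a_bar)   # Eq. (20)
    return h0 - sum(k * k for k in K[:L]) / (2 * math.log(2))


print(differential_entropy_blocks([0, 1] * 50, 1))          # close to 0
print(differential_entropy_correlator([0.1] * 10, 0.5, 10))  # about 0.928
```

For the periodic sequence 0101... the counting estimate gives $h_1 \approx 0$, while Eq. (22) with $K \equiv 0$ and $\bar a = 1/2$ gives $h_L = 1$ bit for every L. Truncating the sum at `len(K)` reproduces the constancy of $h_L$ beyond the memory length, Eq. (21).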


IV. DISCUSSION 

It follows from Eq. (22) that the correction to the entropy $h_0$ of the uncorrelated sequence is a negative and monotonically decreasing function of L. This is the anticipated result: correlations decrease the entropy. The conclusion is not sensitive to the sign of the correlations: persistent correlations, K > 0, describing an "attraction" of symbols of the same kind, and anti-persistent correlations, K < 0, corresponding to an attraction between "0" and "1", provide corrections of the same negative sign. If the correlation function is constant at $1 \le r \le N$, the entropy is a linearly decreasing function of L up to the point N; this result coincides with that obtained in Ref. [...] (in the limit of weak correlations) for the Markov chain model with the step-wise memory function (10).

As an illustration of result (22), Fig. 1 shows the differential entropy versus the length L. Both numerical and analytical results (the dotted and solid curves) are presented for the power-law correlation function $K(r) = 0.01/r^{\ldots}$. The cut-off parameter of the power-law function used for the numerical generation of the sequence, which coincides with the memory length of the chain, is $10^{\ldots}$. The good agreement between the curves demonstrates the adequacy of the additive Markov chain approach for studying the entropy properties of random chains. The abrupt deviation of the dashed line from the upper analytical and numerical curves at $L \approx 10$ results from the violation of inequality (2) and manifests the quickly growing errors of the entropy estimated from the probabilities $P(a_1, \ldots, a_L)$ of L-block occurrence. Note that the violation of Eq. (2) does not depend on the choice of the model parameters; it depends only on the length M of the random sequence.

In the main panel of Fig. 1 the deviation of the numerical curve from the analytical one is nearly absent. Nevertheless, on the large scale presented in the inset, a systematic linear deviation of the numerical result from the analytical one is clearly seen. An explanation of this phenomenon is given in the next section, where finite random sequences are discussed.

FIG. 1: The differential entropy vs the length of words. The solid line is the analytical result, Eq. (22), for the correlation function $K(r) = 0.01/r^{\ldots}$ and $\bar a = 1/2$, whereas the dots correspond to direct evaluations of the same Eq. (22) for the sequence numerically constructed (with length $M = 10^{\ldots}$ and cut-off parameter $r_c = 10^{\ldots}$) by means of conditional probability function (14), using the numerically evaluated correlation function K(r) of the constructed sequence. The dashed line is the differential entropy, Eqs. (15) and (16), plotted by using the numerical estimation of the probability $P(a_1, \ldots, a_L)$ of L-block occurrence in the same sequence. The inset demonstrates the linear dependence of the differential entropy at large L governed by fluctuations of the correlation function.

FIG. 2: Differential entropy h vs length L for the R3 chromosome of Drosophila melanogaster DNA of length $M \approx 2.7 \times 10^7$. The solid line is obtained by using Eq. (22) with the numerically evaluated correlation function, Eq. (12). The dashed line is the differential entropy, Eqs. (15) and (16), plotted by using the numerical estimation of the probability $P(a_1, \ldots, a_L)$ of L-block occurrence in the same sequence.

Our next illustration of the applicability of the developed theory deals with the DNA sequence of the R3 chromosome of Drosophila melanogaster. Figure 2 shows the differential entropy versus the length L. We see that the two approaches coincide only for L < 5-6. It is difficult to draw an unambiguous conclusion about which factor, the finiteness of the chain together with the violation of Eq. (2), or the strength of the correlations, is more important for the discrepancy between the two approaches. Nevertheless, even the observed coincidence between the two curves seems rather astonishing.

Markov chains with step-wise memory functions and a larger class of permutable chains are invariant under the decimation procedure [11]. Chains whose conditional probability functions are independent of the order of symbols in the N-word preceding a generated symbol are referred to as permutable. Decimation is the reduction of a random sequence by regular or random removal of some part of the symbols from the whole chain. It was shown in Refs. [...] that after decimation the correlation function of the indicated classes of sequences is invariant up to the new reduced memory length $N^* = \lambda N$, where $\lambda$ is the relative fraction of symbols retained in the chain. Hence, after decimation Eq. (21) does not change its form; we only have to replace N with the new memory length $N^*$.
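Both decimation variants mentioned above are one-liners. In this sketch, regular decimation is taken to mean keeping every `step`-th symbol ($\lambda = 1/\text{step}$), which is one simple choice among possible regular schemes, and random decimation keeps each symbol independently with probability $\lambda$. The helpers are hypothetical; they only perform the reduction and do not verify the invariance, which the cited results establish for step-wise and permutable chains only.

```python
import numpy as np


def decimate_regular(seq, step):
    """Regular decimation: keep every `step`-th symbol (retained fraction lam = 1/step)."""
    return list(seq[::step])


def decimate_random(seq, lam, seed=0):
    """Random decimation: keep each symbol independently with probability lam."""
    rng = np.random.default_rng(seed)
    seq = np.asarray(seq)
    return seq[rng.random(len(seq)) < lam]


def decimated_memory_length(N, lam):
    """Reduced memory length N* = lam * N, as quoted for permutable chains."""
    return lam * N


print(decimate_regular(list(range(10)), 2))   # [0, 2, 4, 6, 8]
print(decimated_memory_length(100, 0.5))      # 50.0
```

For permutable chains, Eq. (21) applied to the decimated sequence then simply uses $N^*$ in place of N.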


V. FINITE RANDOM SEQUENCES 

The relative average number of unities $\bar a$, the correlation functions, and other statistical characteristics of random sequences are deterministic quantities only in the limit of infinite length. This is a direct consequence of the law of large numbers. If the length M of the sequence is finite, the set of numbers $\{a_i\}_{i=0}^{M-1}$ can no longer be considered an ergodic sequence. In order to restore its status, we have to introduce an ensemble of finite sequences $\{\{a_i\}_{i=0}^{M-1}\}_p$, $p \in \mathbb{N} = 0, 1, 2, \ldots$. However, we would like to retain the right to examine finite sequences, even if only approximately, by using a single finite chain. So, for a finite chain we have to replace definition (12) of the correlation function with the following one,

$$C_M(r) = \frac{1}{M - r}\sum_{i=0}^{M-r-1}(a_i - \bar a)(a_{i+r} - \bar a), \qquad \bar a = \frac{1}{M}\sum_{i=0}^{M-1} a_i. \qquad (23)$$

Now the correlation functions and $\bar a$ are random quantities depending on the particular realization of the sequence $\mathbb{A}$. Their fluctuations can contribute to the entropy of finite random chains even if correlations in the random sequence are absent. It is well known that the relative fluctuations of an additive random quantity (such as the correlation function, Eq. (23)) are of order $1/\sqrt{M}$.

Below we give a more rigorous justification of this explanation and show its applicability to our case. Let us present the correlation function $C_M(r)$ as the sum of two components,

$$C_M(r) = C(r) + C_f(r), \qquad (24)$$

where the first summand, $C(r) = \lim_{M\to\infty} C_M(r)$, is the correlation function determined by Eqs. (12) and (23), obtained by averaging over the sequence with respect to the index i that numbers the elements of the sequence $\mathbb{A}$, and the second one, $C_f(r)$, is a fluctuation-dependent contribution. Due to the ergodicity of the sequence, the function C(r) can also be presented as the ensemble average $C(r) = \langle C_M(r)\rangle$.

Now we can find a connection between the variances of $C_M(r)$ and $C_f(r)$. Taking into account that the correlations are weak and neglecting their contribution to $C_f(r)$, we have

$$\langle C_M^2(r)\rangle = C^2(r) + \langle C_f^2(r)\rangle. \qquad (25)$$

To obtain this equation we used Eq. (24) and the property $\langle C_f(r)\rangle = 0$ at $r \ne 0$.
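The mean fluctuation of the squared correlation function $C_f^2(r)$ is

$$\langle C_f^2(r)\rangle = \frac{1}{(M - r)^2}\sum_{n,m=0}^{M-r-1}\big\langle (a_n - \bar a)(a_{n+r} - \bar a)(a_m - \bar a)(a_{m+r} - \bar a)\big\rangle. \qquad (26)$$

Neglecting correlations between the elements $a_n$ and taking into account that only the terms with n = m give a nonzero contribution, each equal to $C^2(0)$, we easily obtain

$$\langle K_f^2(r)\rangle = \frac{\langle C_f^2(r)\rangle}{C^2(0)} = \frac{1}{M - r} \simeq \frac{1}{M}, \qquad r \ll M. \qquad (27)$$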


Note that Eq. (27) is obtained by averaging over the ensemble of chains. This is the shortest way to obtain the desired result. At the same time, in the numerical simulations we used only averaging over the chain, as is seen from Eq. (23), where summation over the sites i of the chain plays the role of averaging.

Note also that the different symbols $a_i$ in Eq. (26) are in fact correlated. It is possible to show that the contribution of their correlations to $\langle K_f^2(r)\rangle$ is of order 1/M.
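Eq. (27) is easy to check by simulation. The sketch below evaluates the single-chain estimator (23), normalized by $C_M(0)$, for uncorrelated binary chains, where $K(r) = 0$ and therefore $K_M(r) = K_f(r)$, and averages $K_f^2(r)$ over an ensemble of such chains. The chain length, lag, ensemble size, and function names are arbitrary illustrative choices.

```python
import numpy as np


def finite_correlation(a, r):
    """K_M(r) = C_M(r) / C_M(0), with C_M(r) from Eq. (23), for a single finite chain."""
    d = np.asarray(a, dtype=float)
    d = d - d.mean()                       # sample average a_bar of Eq. (23)
    M = len(d)
    return (np.dot(d[:M - r], d[r:]) / (M - r)) / (np.dot(d, d) / M)


def mean_squared_fluctuation(M, r, n_chains=400, seed=0):
    """Ensemble average <K_f^2(r)> over uncorrelated chains (K(r) = 0, so K_M = K_f)."""
    rng = np.random.default_rng(seed)
    samples = [finite_correlation(rng.integers(0, 2, M), r) ** 2 for _ in range(n_chains)]
    return float(np.mean(samples))


M, r = 2000, 10
print(mean_squared_fluctuation(M, r) * (M - r))   # close to 1, cf. Eq. (27)
```

The printed product $\langle K_f^2(r)\rangle(M - r)$ should be close to 1, with a statistical scatter of roughly 7% for 400 chains.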

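The fluctuating contribution, proportional to $\sum_{r=1}^{L}\langle K_f^2(r)\rangle$, should be removed from the sum of squared correlators in Eq. (22), which is valid only for the infinite chain. Thus, Eqs. (22) and (27) yield the differential entropy of finite binary weakly correlated random sequences,

$$h_L = h_0 - \frac{1}{2\ln 2}\left[\sum_{r=1}^{L} K^2(r) - \ln\frac{M}{M - L}\right], \qquad (28)$$

where K(r) is the correlation function evaluated for the finite sequence, and $\sum_{r=1}^{L} 1/(M - r) \simeq \ln[M/(M - L)]$ has been used. It is clear that in the limit $M \to \infty$ this function transforms into Eq. (22). When $L \ll M$, the last term in Eq. (28) takes the form L/M and describes the linearly decreasing entropy in the inset of Fig. 1.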
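A sketch of the fluctuation correction, assuming the form of Eq. (28) in which the expected noise contribution (Eq. (27) summed over r) is removed from the sum of squared measured correlators: $h_L = h_0 - [\sum_{r=1}^{L}K^2(r) - \sum_{r=1}^{L}1/(M - r)]/(2\ln 2)$, with $\sum_{r=1}^{L}1/(M - r) \simeq \ln[M/(M - L)]$. The function name is hypothetical, and the exact sum is used instead of the logarithm.

```python
import math


def finite_chain_entropy(K_measured, a_bar, L, M):
    """Fluctuation-corrected h_L for a chain of length M (assumed form of Eq. (28)).

    K_measured[r-1] is the correlator K(r) estimated from the finite chain, Eq. (23).
    """
    h0 = -a_bar * math.log2(a_bar) - (1 - a_bar) * math.log2(1 - a_bar)   # Eq. (20)
    correlated = sum(k * k for k in K_measured[:L])
    # Expected noise contribution, Eq. (27) summed over r: sum 1/(M-r) ~ ln(M/(M-L)).
    noise = sum(1.0 / (M - r) for r in range(1, L + 1))
    return h0 - (correlated - noise) / (2 * math.log(2))


M, L = 1000, 50
K_noise = [math.sqrt(1.0 / (M - r)) for r in range(1, L + 1)]
print(finite_chain_entropy(K_noise, 0.5, L, M))   # h_0 = 1 up to rounding
```

If the measured correlators are pure noise, $K^2(r) = 1/(M - r)$, the corrected entropy returns $h_0$; for $M \to \infty$ the noise term vanishes and the function reduces to Eq. (22).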

The squared correlation function $K^2(r)$ is usually a decreasing function of r, whereas the function $\langle K_f^2(r)\rangle = 1/(M - r)$ is an increasing one. Hence, the terms $\sum_{r=1}^{L}K^2(r)$ and $\ln[M/(M - L)]$, being concave and convex functions of L, respectively, describe competing contributions to the entropy. It is not possible to analyze all particular cases of their relationship. Therefore, we indicate here the most interesting ones, keeping in mind a monotonically decreasing correlation function. An example of such a function, $K(r) = \alpha/r^{\beta}$, $\alpha > 0$, $\beta \ge 1$, was considered above.

If the correlations are extremely small and comparable with the inverse length of the sequence, $K^2(1) \lesssim 1/M$, the fluctuating part of the entropy exceeds the correlation part for nearly all values of L > 1.



FIG. 3: The differential entropy vs the length of words. The solid line is the analytical result for the correlation function $K(r) = 0.01/r^{\ldots}$, whereas the dots correspond to direct numerical evaluations of Eq. (22) for the numerically constructed sequence of length $M = 10^{\ldots}$ and cut-off parameter $r_c = 20$. The dashed line is the differential entropy with the fluctuation correction described by Eq. (28).

With increasing M (or increasing correlations), when the inequality $K^2(1) > 1/M$ is fulfilled, there is at least one point where the contributions of the fluctuation and correlation parts of the entropy are equal. For a monotonically decreasing function K(r) there is only one such point. Comparing the functions in the square brackets of Eq. (28), we find that they are equal at some $L = R_s$, which hereafter will be referred to as the stationarity length. If $L \ll R_s$, the fluctuations of the correlation function are negligibly small compared with its magnitude; hence, the finite sequence may be considered quasi-stationary. At $L \sim R_s$ the fluctuations are of the same order as the genuine correlation function $K^2(r)$, and we have to take into account the fluctuation correction due to the finiteness of the random chain. At $L > R_s$ the fluctuating contribution exceeds the correlation one.
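The stationarity length $R_s$ can be located numerically by comparing the two accumulated contributions. This sketch assumes that the comparison is between $\sum_{r \le L}K^2(r)$ and the summed fluctuation term $\sum_{r \le L}1/(M - r)$ obtained from Eq. (27), with K(r) = 0 beyond the supplied values; the hypothetical function returns the first L at which the fluctuations catch up.

```python
import numpy as np


def stationarity_length(K, M):
    """Smallest L at which summed fluctuations reach summed squared correlations.

    K[r-1] = K(r) for r = 1..len(K), assumed zero beyond; M is the chain length.
    Compares sum_{r<=L} K^2(r) with sum_{r<=L} 1/(M-r), i.e. Eq. (27) summed over r.
    """
    K = np.asarray(K, dtype=float)
    r = np.arange(1, M)                       # admissible lags r = 1..M-1
    k2 = np.zeros(M - 1)
    n = min(len(K), M - 1)
    k2[:n] = K[:n] ** 2
    correlated = np.cumsum(k2)
    fluctuating = np.cumsum(1.0 / (M - r))
    crossed = np.nonzero(fluctuating >= correlated)[0]
    return int(crossed[0]) + 1 if crossed.size else None


print(stationarity_length([0.1] * 10, 10**6))   # about 9.5e4
```

With the made-up step correlator K(r) = 0.1 for r up to 10 and $M = 10^6$, the crossing lies near $L \approx 9.5 \times 10^4$; if $K^2(1) \le 1/(M - 1)$, the function returns 1, in line with the remark that fluctuations then dominate for nearly all L.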

Another important parameter of the random sequence is the memory length N. If N is less than $R_s$, we have no difficulty calculating the entropy of the finite sequence, which can be considered quasi-stationary. This case is illustrated in Fig. 3. If the memory length exceeds the stationarity length, $R_s < N$, we have to take into account the fluctuation correction to the entropy.



FIG. 4: The differential entropy vs the length of words, with a logarithmic y-axis. The solid and dotted curves are the same as in the main panel of Fig. 1. The dashed line corresponds to direct evaluations of Eq. (22) for the sequence numerically constructed by means of Eq. (14), with fluctuation correction (28) and cut-off parameter $r_c = 10^{\ldots}$. The inset shows the large-L region for the sequence of length $M = 10^{\ldots}$.

Figure 4 shows the differential entropy versus the length of words as an illustration of the importance of this correction. Both numerical and analytical results are presented for the same power-law correlation function as in Figs. 1 and 3. Comparing the sums of the squared correlation function $K(r) = 0.01/r^{\ldots}$ with the fluctuation contribution, proportional to $\log_2[M/(M - L)]$, we find that they are equal at $R_s \sim 10^{\ldots}$. A graphical confirmation of this result is shown in the inset of Fig. 4. We can conclude that the dashed lines approximate the theoretical solid curves better than the dotted lines do (up to $L \approx 10^{\ldots}$). The nonmonotonic decrease of h(L) is due to the fluctuations of the random quantity, the entropy of the finite sequence.


VI. APPLICATION TO WRITTEN TEXTS 

The theory of additive Markov chains with long-range memory has been used to describe the correlation properties of literary texts [...]. Coarse-grained natural written texts were shown to be strongly correlated sequences that possess anti-persistent properties at small distances (in the region L < 300, where grammatical rules act). At long distances (in the region L > 300, where semantic rules act) they manifest weak persistent power-law correlations. It is clear that our model of the additive Markov chain can only claim to describe the weak power-law part of the entropy, proportional to $L^{-\gamma}$.

Ebeling and Nicolis [...] and Schürmann and Grassberger [13] suggested the empirical form of the entropy for written texts,

$$h_L = h + \frac{c}{L^{\gamma}}, \qquad \gamma > 0. \qquad (29)$$

A natural question arises about the origin of this dependence. A partial answer is as follows. The entropy of the Markov chain with the step-wise memory function (10) in the limit of strong correlations, $N\ln N\,(1 - 2\mu) \ll 4\mu$, was obtained in Ref. [...]:

hL = h + J^^. (30) 

Jj 

Comparing the results of Eqs. (21) and (30) with that of Eq. (29), it becomes clear that the term $\log_2 L$ describes strong short-range correlations, whereas the power-law term $L^{-\gamma}$ is responsible for weak long-range correlations. So we need a combined model that unifies two approaches: the additive Markov chain discussed above and the Markov chain with a step-wise memory function.

The question of which part of the correlation or memory function is responsible for the decimation invariance still remains open.

VII. CONCLUSION AND PERSPECTIVES 

(i) The main result of the paper, the differential entropy of the stationary ergodic binary weakly correlated random sequence $\mathbb{A}$, is given by Eq. (22). Another important point of the work is the calculation of the fluctuation contribution to the entropy due to the finiteness of random chains, the last term in Eq. (28).

(ii) To obtain Eq. (22) we assumed that the random sequence of symbols is a Markov chain. Nevertheless, the final result contains only the correlation function and does not contain the conditional probability function of the Markov chain. This allows us to suppose that result (22) and the region of its applicability are wider than the assumptions under which it was obtained [...].

(iii) To obtain Eq. (22) we supposed that the correlations in the random chain are weak. This is not a very severe restriction. Many examples of such systems, described by means of the pair correlator, are given in Ref. [...]. The randomly chosen example of a DNA sequence supports this conclusion. Strongly correlated systems, as opposed to weakly correlated chains, are nearly deterministic. Their description requires a completely different approach, and their study is beyond the scope of this paper.











(iv) The developed theory opens a way for constructing a more consistent and sophisticated approach to describing systems with long-range correlations. Namely, Eq. (22) can be considered as an expansion of the entropy in a series with respect to the small parameter $\delta$, where the entropy $h_0$ of the non-correlated sequence is the zeroth approximation. Alternatively, for the zeroth approximation we can use the exactly solvable model of the N-step Markov chain with the conditional probability function taken in the form of the step-wise function, Eq. (10). Another way to choose the zeroth approximation can be based on the CPF obtained from the probability of block occurrence, Eq. (5).

(v) In this paper we have considered random sequences with a binary state space, but almost all results can be generalized to non-binary sequences and applied to the description of natural written and DNA texts.

(vi) Our consideration can be generalized to Markov chains with infinite memory length N. In this case we have to impose a condition on the rate of decrease of the correlation function and the conditional probability function as $N \to \infty$.


Appendix A:

Here we prove Eq. (19) using Eq. (14) as a starting point. It follows from definition (5) of the conditional probability function that

$$P(1 \mid a_{i-N+1}^{i-1}) = \frac{P(a_{i-N+1}^{i-1}, 1)}{P(a_{i-N+1}^{i-1})}. \qquad (A1)$$

Adding the symbol $a_{i-N}$, we have

$$P(1 \mid a_{i-N+1}^{i-1}) = \sum_{a_{i-N}=0,1}\frac{P(a_{i-N}, a_{i-N+1}^{i-1}, 1)}{P(a_{i-N+1}^{i-1})}. \qquad (A2)$$

Replacing here the probabilities $P(a_{i-N}, a_{i-N+1}^{i-1}, 1)$ with the CPF $P(1 \mid a_{i-N}, a_{i-N+1}^{i-1})$ from an equation similar to Eq. (A1),

$$P(a_{i-N}, a_{i-N+1}^{i-1}, 1) = P(1 \mid a_{i-N}, a_{i-N+1}^{i-1})\,P(a_{i-N}, a_{i-N+1}^{i-1}), \qquad a_{i-N} \in \{0, 1\}, \qquad (A3)$$

and using Eq. (14), after some algebraic manipulations we get

$$P(1 \mid a_{i-N+1}^{i-1}) = \bar a + \sum_{r=1}^{N-1} F(r)(a_{i-r} - \bar a) + \frac{F(N)}{P(a_{i-N+1}^{i-1})}\Big[(1 - \bar a)P(1, a_{i-N+1}^{i-1}) - \bar a\,P(0, a_{i-N+1}^{i-1})\Big]. \qquad (A4)$$

From the compatibility condition for the Chapman-Kolmogorov equation (see, for example, Ref. [...]),

$$P(a_{i-N+1}^{i}) = \sum_{a_{i-N}=0,1} P(a_{i-N}^{i-1})\,P_N(a_i \mid a_{i-N}^{i-1}), \qquad (A5)$$

it follows that its solution is given by

$$P(k) = \bar a^{k}(1 - \bar a)^{N-k}\left[1 + O(\delta)\right]. \qquad (A6)$$

Here P(k) is the probability of having k unities and (N - k) zeros at fixed sites of the N-word. Therefore, the expression in the square brackets of Eq. (A4) vanishes in the main approximation, so that the difference $[(1 - \bar a)P(1, a_{i-N+1}^{i-1}) - \bar a\,P(0, a_{i-N+1}^{i-1})]$ is of order $\delta$. Since F(N) is itself small, the third term on the right-hand side of Eq. (A4) is of second order in $\delta$ and has to be neglected. So, Eq. (19) is proven for L = N - 1. By induction, the equation can be written for arbitrary L.